A Assessing Relevance and Trust of the Deep Web Sources and Results Based on Inter-Source Agreement1
نویسندگان
چکیده
Deep web search engines face the formidable challenge of retrieving high quality results from the vast collection of searchable databases. Deep web search is a two step process of selecting the high quality sources and ranking the results from the selected sources. Though there are existing methods for both the steps, they assess the relevance of the sources and the results using the query-result similarity. When applied to the deep web these methods have two deficiencies. First is that they are agnostic to the correctness (trustworthiness) of the results. Secondly, the query based relevance does not consider the importance of the results and sources. These two considerations are essential for the deep web and open collections in general. Since a number of deep web sources provide answers to any query, we conjuncture that the agreements between these answers are helpful in assessing the importance and the trustworthiness of the sources and the results. For assessing source quality, we compute the agreement between the sources as the agreement of the answers returned. While computing the agreement, we also measure and compensate for the possible collusion between the sources. This adjusted agreement is modeled as a graph with sources at the vertices. On this agreement graph, a quality score of a source that we call SourceRank, is calculated as the stationary visit probability of a random walk. For ranking results, we analyze the second order agreement between the results. Further extending SourceRank to multi-domain search, we propose a source ranking sensitive to the query domains. Multiple domain specific rankings of a source are computed, and these ranks are combined for the final ranking. We perform extensive evaluations on online and hundreds of Google Base sources spanning across domains. The proposed result and source rankings are implemented in the deep web search engine Factal. We demonstrate that the agreement analysis tracks source corruption. Further, our relevance evaluations show that our methods improve precision significantly over Google Base and the other baseline methods. The result ranking and the domain specific source ranking are evaluated separately.
منابع مشابه
Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels
The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...
متن کاملAssessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools
Introduction: Study evaluated the internal structure of Ellis information seeking model in the student community with the aim of presenting the Persian norm. Methods: This is a descriptive-analytical study conducted by cross-sectional survey method in the second semester of the academic year 1399-1400. Population comprise of 280 graduate students at Ahvaz Jundishapur University of Medical Scien...
متن کاملTopic Sensitive SourceRank: Extending SourceRank for Performing Context-Sensitive Search over Deep Web by
Source selection is one of the foremost challenges for searching deep-web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to the user query. Existing source selection models employ query-similarity based local measures for assessing source quality. These local measures are necessary but not sufficient as they are agnostic ...
متن کاملA confidence-aware interval-based trust model
It is a common and useful task in a web of trust to evaluate the trust value between two nodes using intermediate nodes. This technique is widely used when the source node has no experience of direct interaction with the target node, or the direct trust is not reliable enough by itself. If trust is used to support decision-making, it is important to have not only an accurate estimate of trust, ...
متن کاملAgreement Based Source Selection for the Multi-Domain Deep Web Integration
One immediate challenge in searching the deep web databases is source selection—i.e. selecting the most relevant web databases for answering a given query. For open collections like the deep web, the source selection must be sensitive to trustworthiness and importance of sources. Recent advances solve these problems for a single topic deep web search adapting an agreement based approach (c.f. S...
متن کامل